1 Introduction

Determining which individuals are most susceptible to a disease allows resources to be allocated productively. For diseases such as dementia, individuals both inherit risk factors and accrue them throughout life. No factor is causative on its own, but understanding what contributes to a high risk allows the public health sector to assess and prevent potential health crises. Dementia is a clinical syndrome characterized by difficulties in memory and language, psychological and psychiatric changes, and impairments in activities of daily life (Burns and Iliffe 2009). Dementia’s complex list of possible symptoms is reflected in its causes. Common origins of dementia are degenerative neurological diseases such as Parkinson’s or Alzheimer’s; however, vascular disorders in the brain, traumatic head injuries and some infections can also lead to a dementia diagnosis.

The data used in the analysis was obtained from a longitudinal study of 150 participants. Participants were right handed, either male or female, and aged between 60 and 96. They were characterized as nondemented, demented or converted (became demented during the course of the study). At each session, participants underwent T1-weighted MRI scans, the results of which are recorded in the data set. Participants attended two or more sessions, each separated by at least a year.

This workflow addresses two questions: which factors are associated with an increased risk of dementia, and which factors are associated with an increased risk over time? It is important to note that no single determinant causes dementia. The profiles of two people characterized as having dementia may be completely different.

2 Methods

The workflow is produced with R, a statistical computing language (R Core Team 2020), and R Markdown, which generates this HTML report (Allaire et al. 2020). The bookdown package is used to add features to R Markdown such as cross-referencing (Xie 2016).

Data is imported in R using the tidyverse (Wickham et al. 2019) and readxl (Wickham and Bryan 2019) packages.
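The import step can be sketched as follows. This is only an illustration: the helper name read_dementia_sheets is hypothetical, and selecting the sheets by position (visit data first, patient data second) is an assumption, since the report does not state the sheet names or their order. The workbook name dementia.xlsx is the one given in the data description.

```r
# Hypothetical helper: reads both sheets of the workbook into a named list.
# ASSUMPTION: visit data is the first sheet and patient data the second.
read_dementia_sheets <- function(path, visit_sheet = 1, patient_sheet = 2) {
  list(
    visit_data   = readxl::read_excel(path, sheet = visit_sheet),
    patient_data = readxl::read_excel(path, sheet = patient_sheet)
  )
}

# Only attempt the import when the workbook is actually present.
if (file.exists("dementia.xlsx")) {
  sheets <- read_dementia_sheets("dementia.xlsx")
  str(sheets$visit_data)
  str(sheets$patient_data)
}
```

Passing sheet names instead of positions (for example visit_sheet = "some name") also works with read_excel and is more robust if the sheet order changes.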

2.1 Data Description

The raw data consists of two Excel sheets within the same spreadsheet, dementia.xlsx. One sheet, visit data, contains information on the number of visits and MRIs as well as numerical results. The second sheet, patient data, contains information on current dementia status, sex, education and social status. Each row is one patient’s data at one given time. Repeated subject_IDs appear because some patients had data collected once a year over multiple years. Explanations of each column can be seen in Table 2.1.

Table 2.1: Key Terms Table
Term Definition
MMSE Mini-Mental State Examination score (range: 0 = worst to 30 = best). A 30-point questionnaire used to measure cognitive impairment. A score above 24 is considered normal. Lower scores may correlate with dementia, although this is not true in every case.
CDR Clinical Dementia Rating (0 = no impairment, 0.5 = questionable, 1 = mild, 2 = moderate, 3 = severe). A clinical tool that measures relative dementia symptoms based on 6 domains (memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care).
eTIV Estimated total intracranial volume (mm3)
nWBV Normalized whole-brain volume (%)
ASF Atlas Scaling Factor (unitless)
M_F Patient sex. Female is represented by a 1, male by a 2
EDUC Years of Education
SES Socioeconomic status, assessed by the Hollingshead Four Factor Index of Social Status, which measures the social status of an individual based on 4 domains: marital status, retired/employed status, educational attainment, and occupational prestige. A score of 1 indicates the highest status, while 5 indicates the lowest status.

2.2 Data Transformation

Three data sets were created from the raw data. Each one starts by merging visit_data and patient_data into one set by subject ID. After import and merging, variable names are cleaned with the janitor (Firke 2020) package. From here they differ as described below:

  1. Dementia: used to look at which factors are associated with an increased risk of dementia. Columns not used in the analysis (subject_id, visit, group and mri_number) and rows with NA values are removed. Finally, the values in m_f are converted to numerical values for analysis (F = 1, M = 2).

  2. Dementia2: used to look at factors contributing to dementia over time. This data set was intended for use in either a paired Student’s t-test or a paired-samples Wilcoxon test. The cdr and mri_number columns and rows with NA values are removed. The rows are arranged in ascending order of visit number. The values in m_f are converted to numerical values for analysis (F = 1, M = 2). Rows whose group is Nondemented or Demented are removed, as these groups did not change over time. Only visits 1 and 2 are kept, because most subjects had no data for visit 3 or higher. The OAS2_ prefix is removed from the subject_id strings. Subjects whose subject_id appears only once are removed, as they have no pair. Finally, the visit levels are ordered, purely so that start and end appear in order in the boxplots.

  3. Dementia_extract: used to generate some of the values used for inline reporting. In this data set all repeated subject_ids are removed so that the number of participants can be counted accurately.
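The steps listed for Dementia2 can be sketched with dplyr as below. This is a minimal illustration rather than the report's actual code: the two toy tables, every value in them, and the assumption that patient_data holds one row per subject are invented for the example, and the column names are written as they would appear after janitor cleaning.

```r
library(dplyr)

# Toy stand-ins for the two sheets; all values are invented for illustration.
visit_data <- tibble(
  subject_id = c("OAS2_0001", "OAS2_0001", "OAS2_0002", "OAS2_0002",
                 "OAS2_0002", "OAS2_0003", "OAS2_0004"),
  visit      = c(1, 2, 1, 2, 3, 1, 1),
  mri_number = c(1, 2, 1, 2, 3, 1, 1),
  mmse       = c(29, 27, 30, 28, 26, 28, 22),
  cdr        = c(0, 0.5, 0, 0.5, 0.5, 0, 1)
)
# ASSUMPTION: one row per subject in the patient sheet.
patient_data <- tibble(
  subject_id = c("OAS2_0001", "OAS2_0002", "OAS2_0003", "OAS2_0004"),
  group      = c("Converted", "Converted", "Converted", "Demented"),
  m_f        = c("F", "M", "F", "M")
)

dementia2 <- visit_data %>%
  inner_join(patient_data, by = "subject_id") %>%    # merge by subject ID
  select(-cdr, -mri_number) %>%                      # drop unused columns
  na.omit() %>%                                      # drop rows with NA values
  arrange(visit) %>%                                 # ascending visit order
  mutate(m_f = if_else(m_f == "F", 1, 2)) %>%        # F = 1, M = 2
  filter(group == "Converted", visit %in% c(1, 2)) %>%
  mutate(subject_id = sub("OAS2_", "", subject_id, fixed = TRUE)) %>%
  group_by(subject_id) %>%
  filter(n() == 2) %>%                               # keep complete pairs only
  ungroup() %>%
  mutate(visit = factor(visit, levels = c(1, 2), labels = c("start", "end")))

print(dementia2)
```

In the toy data, subject 0003 has only one visit and subject 0004 is not in the Converted group, so both drop out and two complete start/end pairs remain.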

3 Which determinants correlate with increased CDR?

Using the dementia data set, this section looks at which determinants correlate with a high clinical dementia rating (CDR); in other words, which determinants are linked with dementia. A list and explanation of the determinants used in this analysis can be seen in Table 2.1.

3.1 Scatter Plots

Plots are generated using ggplot2 from the tidyverse package (Wickham et al. 2019). Arrangement of plots into a grid was achieved using ggarrange from the ggpubr package (Kassambara 2020).

Figure 3.1: Scatter Plots That Demonstrate Correlations Between Determinant And A High CDR
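A grid of scatter plots of this kind could be built roughly as follows. The sketch uses invented toy values, and the choice of axes (determinant on x, CDR on y), the two determinants shown and the panel labels are all assumptions, not the report's actual plotting code.

```r
library(ggplot2)
library(ggpubr)

# Toy data; all values are invented for illustration.
toy <- data.frame(
  cdr  = c(0, 0, 0.5, 0.5, 1, 1, 2),
  mmse = c(30, 29, 27, 26, 21, 19, 17),
  nwbv = c(0.76, 0.74, 0.73, 0.71, 0.70, 0.69, 0.68)
)

# One scatter plot per determinant (axis orientation is an assumption).
p_mmse <- ggplot(toy, aes(x = mmse, y = cdr)) +
  geom_point() +
  labs(x = "MMSE", y = "CDR")

p_nwbv <- ggplot(toy, aes(x = nwbv, y = cdr)) +
  geom_point() +
  labs(x = "nWBV", y = "CDR")

# Arrange the individual plots into a single grid.
scatter_grid <- ggarrange(p_mmse, p_nwbv, ncol = 2, nrow = 1,
                          labels = c("A", "B"))
print(scatter_grid)
```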

3.2 Summary Table

Table generated using the kableExtra package (Zhu 2020).

Table 3.1: Summary Statistics Table
Determinant CDR Mean N Standard Deviation Standard Error Minimum Maximum
Age 0.0 77.1553398 206 8.0894478 0.5636185 60.000 97.000
Age 0.5 77.4363636 110 7.3015359 0.6961741 62.000 92.000
Age 1.0 74.3714286 35 6.8645968 1.1603286 61.000 96.000
Age 2.0 85.0000000 3 11.2694277 6.5064071 78.000 98.000
MMSE 0.0 29.2233010 206 0.9205729 0.0641394 25.000 30.000
MMSE 0.5 26.4636364 110 3.0400304 0.2898555 17.000 30.000
MMSE 1.0 20.3142857 35 5.2735267 0.8913887 4.000 30.000
MMSE 2.0 20.3333333 3 5.0332230 2.9059326 15.000 25.000
eTIV 0.0 1486.8592233 206 179.9986303 12.5410988 1106.000 2004.000
eTIV 0.5 1482.4545455 110 174.0359889 16.5936805 1143.000 1928.000
eTIV 1.0 1528.0000000 35 157.8443015 26.6805566 1274.000 1957.000
eTIV 2.0 1538.0000000 3 157.4452286 90.9010451 1401.000 1710.000
nWBV 0.0 0.7404515 206 0.0373497 0.0026023 0.644 0.837
nWBV 0.5 0.7205182 110 0.0345072 0.0032901 0.646 0.806
nWBV 1.0 0.6990571 35 0.0224564 0.0037958 0.657 0.756
nWBV 2.0 0.7066667 3 0.0503322 0.0290593 0.660 0.760
ASF 0.0 1.1971068 206 0.1405721 0.0097941 0.876 1.587
ASF 0.5 1.1995091 110 0.1365395 0.0130185 0.910 1.535
ASF 1.0 1.1600286 35 0.1146708 0.0193829 0.897 1.377
ASF 2.0 1.1490000 3 0.1146865 0.0662143 1.026 1.253
EDUC 0.0 15.1601942 206 2.7047506 0.1884489 8.000 23.000
EDUC 0.5 14.0090909 110 3.1781809 0.3030277 6.000 20.000
EDUC 1.0 14.0000000 35 2.4970571 0.4220797 8.000 20.000
EDUC 2.0 17.0000000 3 3.0000000 1.7320508 14.000 20.000
SES 0.0 2.3349515 206 1.0497116 0.0731369 1.000 5.000
SES 0.5 2.6818182 110 1.2186006 0.1161890 1.000 5.000
SES 1.0 2.5714286 35 1.2434703 0.2101848 1.000 5.000
SES 2.0 1.6666667 3 1.1547005 0.6666667 1.000 3.000
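The columns of Table 3.1 are consistent with the standard error being the standard deviation divided by the square root of N; for Age at CDR 0, 8.0894478 / sqrt(206) ≈ 0.5636. A summary of this shape could be computed per CDR level as sketched below, using one determinant and invented toy values; the output column names are hypothetical.

```r
library(dplyr)

# Toy data for a single determinant; values are invented for illustration.
toy <- tibble(
  cdr  = c(0, 0, 0, 0.5, 0.5, 0.5),
  mmse = c(30, 29, 28, 27, 25, 26)
)

summary_table <- toy %>%
  group_by(cdr) %>%
  summarise(
    mean_value = mean(mmse),
    n_obs      = n(),
    std_dev    = sd(mmse),
    std_error  = std_dev / sqrt(n_obs),  # SE = SD / sqrt(N)
    minimum    = min(mmse),
    maximum    = max(mmse)
  )

print(summary_table)
```

Repeating the same summarise call for each determinant, or reshaping the data to long format first, would give one block of rows per determinant as in the table.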

4 Influence of Individual Determinants Over A Time Period Of 2 Years

4.1 Box Plots

Plots are generated using ggplot2 from the tidyverse package (Wickham et al. 2019). Arrangement of plots into a grid was achieved using ggarrange from the ggpubr package (Kassambara 2020).

Figure 4.1: Boxplots Showing Determinant Data At The Start And End Of The Study In Converted Patients.
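The start-versus-end comparison can be tested with the paired tests named in Section 2.2. The sketch below uses invented scores for six hypothetical converted participants. Choosing between the two tests with a Shapiro-Wilk check on the paired differences, and the 0.05 threshold, are assumptions made for illustration; the report does not say how the choice was made.

```r
# Invented start and end scores for six hypothetical participants.
start_score <- c(29, 30, 28, 29, 27, 30)
end_score   <- c(27, 28, 28, 26, 26, 27)

# Paired tests work on the within-subject differences.
differences <- end_score - start_score

# ASSUMPTION: use the t-test when the differences look roughly normal,
# otherwise fall back to the paired Wilcoxon signed-rank test.
normality <- shapiro.test(differences)

if (normality$p.value > 0.05) {
  result <- t.test(end_score, start_score, paired = TRUE)
} else {
  # exact = FALSE avoids warnings caused by ties and zero differences.
  result <- wilcox.test(end_score, start_score, paired = TRUE, exact = FALSE)
}

print(result)
```

Both vectors must list participants in the same order, which is why unpaired subjects are removed when Dementia2 is built.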

4.2 Summary Table

5 Dementia Grouping Questionnaire

In addition, an LDA model and a questionnaire whose responses are fed into the model can be found here: dementia_grouping_questionnaire.Rmd. The unique packages used in this are as follows: caret (Kuhn 2020), MASS (Venables and Ripley 2002), shiny (Chang et al. 2020) and shinyforms (Attali, n.d.). An explanation of package use can be found in the linked Rmd file. The model is trained to predict dementia grouping (demented or nondemented).
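As a rough illustration of the idea, an LDA classifier for the two groups can be fitted with MASS::lda as below. The training data are simulated, and the choice of mmse and nwbv as predictors is an assumption; the predictors, training procedure and caret usage in the linked Rmd file may differ.

```r
set.seed(1)
n_per_group <- 40

# Simulated training data; means and spreads are invented for illustration.
train <- data.frame(
  group = factor(rep(c("Nondemented", "Demented"), each = n_per_group)),
  mmse  = c(rnorm(n_per_group, mean = 29, sd = 1),
            rnorm(n_per_group, mean = 24, sd = 3)),
  nwbv  = c(rnorm(n_per_group, mean = 0.74, sd = 0.03),
            rnorm(n_per_group, mean = 0.71, sd = 0.03))
)

# Fit a linear discriminant analysis model on the two predictors.
fit <- MASS::lda(group ~ mmse + nwbv, data = train)

# A hypothetical questionnaire response passed to the model.
new_response <- data.frame(mmse = 22, nwbv = 0.70)
prediction <- predict(fit, newdata = new_response)

print(prediction$class)      # predicted grouping
print(prediction$posterior)  # posterior probability for each group
```

The posterior probabilities indicate how strongly the model favours one grouping over the other for a given response.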

6 Discussion

Issues with the data set: it does not cover all factors (e.g. family history); there are no cases with a CDR of 3.0 or over and only x with a CDR of 2.0; all participants are right handed; and the sample is quite small.

Dementia2: for the statistical tests, visit data was converted to two levels instead of multiple. This analysis therefore only identifies factors that increased risk over time; it does not specify how long this takes. More research could be done on this.

7 Word Count

Word count is calculated using wordcountaddin (Marwick 2020).

This rmd script: 967
The dementia grouping questionnaire script: 338
The README: 342
Total: 1309

References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2020. Rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Attali, Dean. n.d. “Shinyforms.” GitHub.
Burns, Alistair, and Steve Iliffe. 2009. “Dementia.” BMJ 338 (February): b75.
Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2020. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Firke, Sam. 2020. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.
Kassambara, Alboukadel. 2020. Ggpubr: ’ggplot2’ Based Publication Ready Plots. https://CRAN.R-project.org/package=ggpubr.
Kuhn, Max. 2020. Caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret.
Marwick, Ben. 2020. Wordcountaddin: Word Counts and Readability Statistics in R Markdown Documents.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. New York: Springer. http://www.stats.ox.ac.uk/pub/MASS4.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, and Jennifer Bryan. 2019. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Xie, Yihui. 2016. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC. https://github.com/rstudio/bookdown.
Zhu, Hao. 2020. kableExtra: Construct Complex Table with ’kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.